callee (non-leaf):
        S.64    sp,off(dp)
        L.64    sp,off(dp)
        S.64    link,off(sp)
        S.64    dp,off(sp)
        ... (using dp)
        L.64    link,off(sp)
        L.64    dp,off(sp)
        L.64    sp,off(dp)
        B.DOWN  link

callee (leaf):
        ... (using dp)
        B.DOWN

The callee, if it uses a stack for local variable allocation, cannot necessarily trust
the value of the sp passed to it, [...] parameters held in
memory, [...]

Pipeline Organization

Terpsichore performs all [...] one, in-order, with
precise exceptions [...] which ignores the
subsequent discussion [...] will still perform
correctly. However, [...] Terpsichore processor is
achieved only by [...] the characteristics of the
pipeline. In [...] of all Terpsichore
implementations [...] choices for specific
implementations.


[...] several instructions in each clock
[...] types, one instruction of each type
[...] cycle. The ordering required is A, L, E, S, B; in
other words, a register-to-register address calculation, a memory load, a
register-to-register data calculation, a memory store, and a branch. Because of the
organization of the pipeline, each of these instructions may be serially dependent.
Instructions of type E include the fixed-point execute-phase instructions as well as 
floating-point and digital signal processing instructions. We call this form of 
pipeline organization "super-string," 4 because of the ability to issue a string of 
dependent instructions in a single clock cycle, as distinguished from super-scalar 
or super-pipelined organizations, which can only issue sets of independent 
instructions. 
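
The issue rule described above (at most one instruction of each type per string, in the order A, L, E, S, B) can be illustrated with a small sketch. The function name `pack_strings` and the greedy packing policy are assumptions made for illustration only; the text does not say how an implementation actually forms strings, and latency, register dependencies and branch prediction are ignored here.

```python
from typing import List

# Required ordering of instruction types within one issue string.
ORDER = "ALESB"


def pack_strings(types: List[str]) -> List[List[str]]:
    """Greedily split a program-order list of instruction types into strings.

    Hypothetical helper: each string holds at most one instruction of each
    type, and the types within a string appear in the order A, L, E, S, B.
    """
    strings: List[List[str]] = []
    current: List[str] = []
    last_rank = -1  # rank of the last type placed in the current string
    for t in types:
        rank = ORDER.index(t)  # raises ValueError for an unknown type
        if rank <= last_rank:
            # The type would repeat or go backwards: close the string.
            strings.append(current)
            current = []
        current.append(t)
        last_rank = rank
    if current:
        strings.append(current)
    return strings


if __name__ == "__main__":
    for s in pack_strings(list("ALESBALLEB")):
        print("".join(s))
```

Under these assumptions the ten-instruction sequence A L E S B A L L E B issues as three strings: ALESB, AL and LEB. The second load forces a new string because a string may contain only one L.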

These instructions take from two to five cycles of latency to execute, and a branch 
prediction mechanism is used to keep the pipeline filled. The diagram below 
shows a box for the interval between issue of each instruction and the completion. 


4 Readers with a background in theoretical physics may have seen this term in another,
unrelated, context.
Bold letters mark the critical latency paths of the instructions, that is, the periods
between the required availability of the source registers and the earliest 
availability of the result registers. The A-L critical latency path is a special case, in 
which the result of the A instruction may be used as the base register of the L 
instruction without penalty. E instructions may require additional cycles of latency 
for certain operations, such as fixed-point multiply and divide, floating-point and 
digital signal processing operations. 



[...] the organization defined above,
[...] service load operations may be
[...] in which A, L and B type
[...] the back of the pipeline, in which E,
[...] decoupling occurs at the point at which
[...] are referenced; similarly, a FIFO that is
[...] instruction fetch unit decouples instruction cache references from
[...] of the pipeline shown above. The depth of the FIFO structures is
implementation-dependent, i.e. not fixed by the architecture.

The diagram below indicates why we call this pipeline organization feature 
"super-spring," an extension of our super-string organization. 


Super-spring pipeline
With the super-spring organization, the latency of load instructions can be
extended, so execute instructions are deferred until the results of the load are 
available. Nevertheless, the execution unit still processes instructions in normal 
order, and provides precise exceptions. 


[Figure: pipeline diagram with rows of instruction stage labels A, L, E, S, B]



[...] relies upon branch
[...] pipeline [...] around unconditional and conditional
[...] prediction mechanism is tuned for optimizing
[...] frequent alternatives, and will
[...] when executing conditional branches
[...] taken. For such cases, the use of
[...] or of the use of set on compare and
[...] performance.
Studies of the dynamic distribution of Terpsichore instructions on the various
benchmark suites indicate that the most frequently-issued instruction classes are 
load instructions and execute instructions. In a high-performance Terpsichore 
implementation, it is advantageous to consider execution pipelines in which the 
ability to target the machine resources toward issuing load and execute 
instructions is increased. 

One of the means to increase the ability to issue execute-class instructions is to 
provide the means to issue two execute instructions in a single-issue string. The 
execution unit actually requires several distinct resources, so by partitioning these 
resources, the issue capability can be increased without increasing the number of 
functional units, other than the increased register file read and write ports. The 
partitioning favored for the initial implementation places all instructions that 


For evaluation only - 40 - 


microunity confidential 


Terpsichore System Architecture 


REDACTED 


Instruction Set 

All instructions are 32 bits in size, and use the high order 8 bits to specify a major 
operation code. 

31      24 23                           0
|  major  |                             |


The major field is filled with a value specified by the following table:



major operation code field values 


For the major operation field values A.MINOR, L.MINOR, E.MINOR, F.16, F.32,
F.64, F.128, GF.16, GF.32, GF.64, G.1, G.2, G.4, G.8, G.16, G.32, G.64, S.MINOR
and B.MINOR, the lowest-order six bits in the instruction specify a minor
operation code:

31      24 23                 6 5       0
|  major  |       other        |  minor  |
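
The field layout above can be expressed as a short decoding sketch. The function name `decode_fields` and the dictionary keys are hypothetical names chosen for illustration; only the bit positions (major in bits 31..24, minor in bits 5..0) come from the text, and the minor field is meaningful only for the major values listed above.

```python
def decode_fields(instruction: int) -> dict:
    """Split a 32-bit instruction word into its major, other and minor fields.

    Hypothetical helper: bit positions follow the layout in the text
    (major = bits 31..24, other = bits 23..6, minor = bits 5..0).
    """
    if not 0 <= instruction < (1 << 32):
        raise ValueError("instruction must fit in 32 bits")
    return {
        "major": (instruction >> 24) & 0xFF,     # high-order 8 bits
        "other": (instruction >> 6) & 0x3FFFF,   # 18 bits between the two
        "minor": instruction & 0x3F,             # lowest-order 6 bits
    }


if __name__ == "__main__":
    word = (0xAB << 24) | (0x12345 << 6) | 0x2A
    print(decode_fields(word))
```

The values 0xAB, 0x12345 and 0x2A in the usage lines are arbitrary test patterns, not real operation codes.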


5 Blank table entries cause the Reserved Instruction exception to occur.

The minor field is filled with a value from one of the following tables:


(headings: 0  8  16  24  32  [...]  48  56)

0:  AAND
1:  AOR
2:  AXOR
3:  AANDN
4:  AADD  ASUB  ANAND  ASHLI
5:  ANOR
6:  AXNOR  ASHRI
7:  AORN  AUSHRI



minor operation code field values for GF.size


(headings: 0  [...]  16  [...]  48  56)

[...]  GSETNE
GAND  GOR
[...]
GSETGE
GXOR  GANDN
GUMUL  GDIV  GUDIV
GCOMPRESSI  GEXPANDI  GUEXPANDI
GADD  GSUB
GNAND  GNOR
GSETUL  GSETUGE
GXNOR  GORN
GSHR  GUSHR  [...]
        FloatingPoint(minor.op, major.size, minor.round, ra, rb, rc)
    F.UNARY.N, F.UNARY.T, F.UNARY.F, F.UNARY.C,
    F.UNARY, F.UNARY.X:
        case unary of
            F.ABS, F.NEG, F.SQR,
            F.HALF, F.SINGLE, F.DOUBLE, F.QUAD,
            F.INT, F.FLOAT:
                FloatingPointUnary(unary.op, major.size, minor.round, ra, rc)
            others:
                raise ReservedInstruction
        endcase
    others:
        raise ReservedInstruction
endcase

GMULADD1, GMULADD2, GMULADD4,
GMULADD8, GMULADD16, GMULA[...]
GUMULADD2, GUMULADD4,
GUMULADD8, GUMULADD16, GUMUL[...]
GMUX, GMUXGATHER, GS[...]
    GroupTernary(major.s[...]
G.EXTRACT.I, G.EXTRAC[...]
    GroupExtractImm[...]
G.1, G.2, G.4, G.8, [...]
    case minor [...]
[...] G.UMUL,
[...] G.XNOR, G.ORN,
[...] G.SET.UL, G.SET.UGE,
[...] G.COMPRESS, G.EXPAND,
[...]
GFMULADD32, GFMULADD64,
GFMULSUB32, GFMULSUB64:
    GroupFloatingPointTernary(major, ra, rb, rc, rd)
GF.16, GF.32, GF.64, GF.128:
    case minor of
        GF.ADD.N, GF.SUB.N, GF.MUL.N, GF.DIV.N,
        GF.ADD.T, GF.SUB.T, GF.MUL.T, GF.DIV.T,
        GF.ADD.F, GF.SUB.F, GF.MUL.F, GF.DIV.F,
        GF.ADD.C, GF.SUB.C, GF.MUL.C, GF.DIV.C,
        GF.ADD, GF.SUB, GF.MUL, GF.DIV,
        GF.ADD.X, GF.SUB.X, GF.MUL.X, GF.DIV.X,
        GF.SET.E, GF.SET.NE, GF.SET.UE, GF.SET.NUE,
        GF.SET.NUGE, GF.SET.UGE, GF.SET.UL, GF.SET.NUL,
        GF.SET.E.X, GF.SET.NE.X, GF.SET.UE.X, GF.SET.NUE.X,
        GF.SET.L.X, GF.SET.NL.X, GF.SET.NGE.X, GF.SET.GE.X:
            GroupFloatingPoint(minor.op, major.size, minor.round, ra, rb, rc)
        GF.UNARY.N, GF.UNARY.T, GF.UNARY.F, GF.UNARY.C,
        GF.UNARY, GF.UNARY.X:
            case unary of
                GF.ABS, GF.NEG, GF.SQR,
Group

These operations take two values from a pair of registers, perform operations on
groups of bits in the operands, and place the concatenated results in a register.

Operation codes

G.ADD.2



10 G.AND does not require a size specification, and is encoded as G.AND.1.

11 G.ANDN does not require a size specification, and is encoded as G.ANDN.1. G.ANDN is
used as the encoding for G.SET.L.1, and by reversing the operands, for G.SET.UL.1.

12 G.GATHER128 is encoded as G.GATHER.1.

13 G.MUL.1 is used as the encoding for G.UMUL.1.

14 G.NAND does not require a size specification, and is encoded as G.NAND.1.

15 G.NOR does not require a size specification, and is encoded as G.NOR.1.

16 G.OR does not require a size specification, and is encoded as G.OR.1.

17 G.ORN does not require a size specification, and is encoded as G.ORN.1. G.ORN is used as
the encoding for G.SET.UGE.1, and by reversing the operands, for G.SET.GE.1.

18 G.SCATTER128 is encoded as G.SCATTER.1.




G.U.SHR.32      Group unsigned shift right quadlets
G.U.SHR.64      Group unsigned shift right octlets
G.XNOR 19       Group exclusive-nor
G.XOR 20        Group exclusive-or
19 G.XNOR does not require a size specification, and is encoded as G.XNOR.1. G.XNOR is
used as the encoding for G.SET.E.1.

20 G.XOR does not require a size specification, and is encoded as G.XOR.1. G.XOR is used as
the encoding for G.ADD.1, G.SUB.1 and G.SET.NE.1.
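
The encoding notes in footnotes 11, 17, 19 and 20 rest on identities that hold for 1-bit fields, and they can be checked exhaustively. The sketch below is illustrative only and makes three assumptions that the text does not state explicitly: ANDN(a, b) means a AND NOT b, ORN(a, b) means a OR NOT b, and a set operation yields 1 in the field when the comparison is true. A signed 1-bit field is taken to hold the values 0 and -1.

```python
def check_one_bit_equivalences() -> bool:
    """Exhaustively verify the 1-bit identities behind the shared encodings."""
    for a in (0, 1):
        for b in (0, 1):
            sa, sb = -a, -b              # signed reading: bit 1 means -1
            xor = a ^ b
            xnor = xor ^ 1
            andn = a & (b ^ 1)           # assumed: ANDN(a, b) = a AND NOT b
            orn = a | (b ^ 1)            # assumed: ORN(a, b) = a OR NOT b
            andn_rev = b & (a ^ 1)       # ANDN with operands reversed
            orn_rev = b | (a ^ 1)        # ORN with operands reversed

            assert (a + b) & 1 == xor        # G.ADD.1    -> G.XOR
            assert (a - b) & 1 == xor        # G.SUB.1    -> G.XOR
            assert int(a != b) == xor        # G.SET.NE.1 -> G.XOR
            assert int(a == b) == xnor       # G.SET.E.1  -> G.XNOR
            assert int(sa < sb) == andn      # G.SET.L.1  -> G.ANDN
            assert int(a < b) == andn_rev    # G.SET.UL.1 -> G.ANDN, reversed
            assert int(a >= b) == orn        # G.SET.UGE.1 -> G.ORN
            assert int(sa >= sb) == orn_rev  # G.SET.GE.1 -> G.ORN, reversed
    return True


if __name__ == "__main__":
    print(check_one_bit_equivalences())
```

All four input combinations pass under the stated assumptions, which is consistent with the footnotes' claim that no separate 1-bit encodings are needed for these operations.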
class               op                          size

linear              ADD                         [...] 2 4 8 16 32 64
bitwise             AND ANDN NAND NOR
                    OR ORN XNOR XOR
signed multiply     MUL                         1 2 4 8 16 32 64
unsigned multiply   U.MUL                       2 4 8 16 32 64
signed divide       DIV                         [...]
unsigned divide     U.DIV
rearrange           COPY DEAL SWAP SHUFFLE
                    GATHER SCATTER              [...] 4 8 16 32 64
galois field        POLY                        [...] 16 32 64
precision           COMPRESS [...]              [...] 8 16 32 64
shift               SHL SHR U.SHR               4 8 16 32 64



[...] registers ra and rb. The specified
[...] register rc.


case op of
    G.MUL, G.U.MUL, G.DIV, G.U.DIV:
        a <- REG[ra]
        b <- REG[rb]
    G.ADD, G.SUB, G.SET.L, G.SET.UL, G.SET.E, G.SET.NE, G.SET.GE, G.SET.UGE,
    G.AND, G.OR, G.XOR, G.ANDN, G.NAND, G.NOR, G.XNOR, G.ORN,
    G.GATHER, G.SCATTER:
        a <- REG[ra]
        b <- REG[rb]
    G.COMPRESS, G.SHL, G.SHR, G.U.SHR, G.POLY:
        a <- REG[ra]
        b <- REG[rb]
    G.EXPAND, G.U.EXPAND:
        a <- REG[ra]
        b <- REG[rb]
    G.COPY, G.SWAP, G.DEAL, G.SHUFFLE:
        a <- REG[ra] || REG[rb]
endcase
case op of
    G.ADD:
        for i <- 0 to 128-size by size
            c_{i+size-1..i} <- a_{i+size-1..i} + b_{i+size-1..i}
        endfor
    G.MUL:
        for i <- 0 to 64-size by size
            c_{2*(i+size)-1..2*i} <- (a_{size-1+i}^{size} || a_{size-1+i..i}) * (b_{size-1+i}^{size} || b_{size-1+i..i})
        endfor
    G.U.MUL:
        for i <- 0 to 64-size by size
            c_{2*(i+size)-1..2*i} <- (0^{size} || a_{size-1+i..i}) * (0^{size} || b_{size-1+i..i})
        endfor
    G.DIV:
        if (b = 0) or ((a = (1 || 0^{63})) and (b = 1^{64})) then
            c <- undefined
        [...]
        if b = 0 then
            c <- undefined
        else
            [...]

    G.POLY:
        p[0] <- a
        for i <- 1 to size
            p[i] <- (p[i-1]_{0} ? (0^{64} || b) : 0^{128}) xor (p[i-1]_{0} || p[i-1]_{127..1})
        endfor
        c <- p[size]
    G.GATHER:
        for k <- 0 to 128-size by size
            [...]
            for i <- k to k+size-1 by 1
                if a_{i} then
                    c_{j} <- b_{i}
                    [...]
                endif
            endfor
            j <- k+size-1
            for i <- k+size-1 to k by -1
                if ~a_{i} then
                    c_{j} <- b_{i}
                    j <- j - 1
                endif
            endfor
        endfor
    G.SCATTER:
        for k <- 0 to 128-size by size
            j <- k
            for i <- k to k+size-1 by 1
                if a_{i} then
                    c_{i} <- b[...]
        [...]
        [...] to 64-size by size
            c_{i+i+size+size-1..i+i} <- [...] || 0^{b&(size-1)}
        endfor
    G.SHL:
        for i <- 0 to 128-size by size
            c_{i+size-1..i} <- a_{i+size-1-(b&(size-1))..i} || 0^{b&(size-1)}
        endfor
    G.SHR:
        for i <- 0 to 128-size by size
            c_{i+size-1..i} <- a_{i+size-1}^{b&(size-1)} || a_{i+size-1..i+(b&(size-1))}
        endfor
    G.U.SHR:
        for i <- 0 to 128-size by size
            c_{i+size-1..i} <- 0^{b&(size-1)} || a_{i+size-1..i+(b&(size-1))}
        endfor

    G.COPY:
        for i <- 0 to 128-size by size
            [...]
        endfor
    G.SWAP:
        for i <- 0 to 128-size by size
            c_{i+size-1..i} <- a_{127-i..128-size-i}
        endfor
    G.DEAL:
        for i <- 0 to 128-size by size
            j <- (i_{5..0} || 0^{1}) + (i_{6} ? size : 0)
            c_{i+size-1..i} <- a_{j+size-1..j}
        endfor
    G.SHUFFLE:
        for i <- 0 to 128-size by size
            j <- (0^{1} || i_{6..1}) + ((i & size) ? (64 - (0^{1} || size_{6..1})))
            c_{i+size-1..i} <- a_{j+size-1..j}
        endfor

REG[rc] <- c

enddef

Exceptions 
Reserved Instruction 
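
The field-wise loops in the definition above can be mirrored in a short sketch for the operations whose formulas are legible: G.ADD, G.MUL, G.U.MUL and the three shifts. All function names here are hypothetical, Python integers stand in for registers, and the widths (128-bit operands for add and shift, 64-bit operands producing a 128-bit result for multiply) are read off the loop bounds in the pseudocode. This is one reading of a partly garbled scan, not a reference implementation.

```python
def field(x: int, i: int, size: int) -> int:
    """Return bits i+size-1..i of x as an unsigned integer."""
    return (x >> i) & ((1 << size) - 1)


def signed(v: int, size: int) -> int:
    """Reinterpret a size-bit unsigned value as two's complement."""
    return v - (1 << size) if v >> (size - 1) else v


def g_add(a: int, b: int, size: int) -> int:
    """Add corresponding fields; carries do not cross field boundaries."""
    c = 0
    for i in range(0, 128, size):
        c |= field(field(a, i, size) + field(b, i, size), 0, size) << i
    return c


def g_mul(a: int, b: int, size: int, is_signed: bool = True) -> int:
    """Multiply size-bit fields of 64-bit operands into 2*size-bit fields."""
    c = 0
    for i in range(0, 64, size):
        x, y = field(a, i, size), field(b, i, size)
        if is_signed:
            x, y = signed(x, size), signed(y, size)
        c |= field(x * y, 0, 2 * size) << (2 * i)
    return c


def g_shift(a: int, b: int, size: int, op: str) -> int:
    """Shift every field by b & (size-1); op is 'shl', 'shr' or 'ushr'."""
    sh = b & (size - 1)
    c = 0
    for i in range(0, 128, size):
        v = field(a, i, size)
        if op == "shl":
            r = v << sh                  # zeros enter from the right
        elif op == "shr":
            r = signed(v, size) >> sh    # sign bit is replicated
        elif op == "ushr":
            r = v >> sh                  # zeros enter from the left
        else:
            raise ValueError(op)
        c |= field(r, 0, size) << i
    return c


if __name__ == "__main__":
    print(hex(g_add(0xFF01, 0x0101, 8)))
    print(hex(g_mul(0xFF, 0x02, 8)))
    print(hex(g_shift(0x8080, 1, 8, "shr")))
```

With 8-bit fields, adding 0xFF01 and 0x0101 gives 0x0002 because the overflow from the second field is discarded rather than carried, and the signed product of 0xFF and 0x02 is 0xFFFE (that is, -2 in a 16-bit field).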

